Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games

Authors

  • Julien Pérolat
  • Bruno Scherrer
  • Bilal Piot
  • Olivier Pietquin
Abstract

This paper provides an analysis of error propagation in Approximate Dynamic Programming applied to zero-sum two-player Stochastic Games. We provide a novel and unified error propagation analysis in Lp-norm of three well-known algorithms adapted to Stochastic Games (namely Approximate Value Iteration, Approximate Policy Iteration and Approximate Generalized Policy Iteration). We show that we can achieve a stationary policy which is (2γε + ε′)/(1 − γ)²-optimal, where ε is the value function approximation error and ε′ is the approximate greedy operator error. In addition, we provide a practical algorithm (AGPI-Q) to solve infinite horizon γ-discounted two-player zero-sum Stochastic Games in a batch setting. It is an extension of the Fitted-Q algorithm (which solves Markov Decision Processes from data) and can be non-parametric. Finally, we demonstrate experimentally the performance of AGPI-Q on a simultaneous two-player game, namely Alesia.
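To make the dynamic-programming backbone concrete, the sketch below implements the exact Bellman (Shapley) operator for a tiny *turn-based* zero-sum stochastic game and iterates it to a fixed point. This is a deliberate simplification, not the paper's AGPI-Q: AGPI-Q works on simultaneous-move games from batch data with function approximation, whereas here the game, its transitions, and the state/action sets are all hypothetical toy values, and the operator is applied exactly. It only illustrates the γ-discounted backup that the approximate algorithms analyzed in the paper perturb with errors ε and ε′.

```python
GAMMA = 0.9  # discount factor γ

# Hypothetical toy game: 3 states; in each state the controlling player
# picks an action in {0, 1}. Player 1 maximizes, player 2 minimizes.
# transitions[s][a] = (reward, next_state); deterministic for brevity.
transitions = {
    0: {0: (1.0, 1), 1: (0.0, 2)},
    1: {0: (0.0, 0), 1: (2.0, 2)},
    2: {0: (-1.0, 0), 1: (0.0, 1)},
}
# controller[s] = +1 if the maximizer moves in s, -1 if the minimizer does.
controller = {0: +1, 1: -1, 2: +1}

def bellman(V):
    """One application of the zero-sum (Shapley) backup operator."""
    newV = {}
    for s, acts in transitions.items():
        backups = [r + GAMMA * V[s2] for r, s2 in acts.values()]
        newV[s] = max(backups) if controller[s] == +1 else min(backups)
    return newV

# The operator is a γ-contraction, so repeated application converges
# to the unique value of the game.
V = {s: 0.0 for s in transitions}
for _ in range(500):
    V = bellman(V)

print({s: round(v, 3) for s, v in V.items()})
```

In the simultaneous-move setting the paper actually treats, `max`/`min` is replaced by the value of a matrix game at each state (a small linear program), and in AGPI-Q the backup is fitted by regression on sampled transitions rather than computed exactly.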


Similar articles

A TRANSITION FROM TWO-PERSON ZERO-SUM GAMES TO COOPERATIVE GAMES WITH FUZZY PAYOFFS

In this paper, we deal with games with fuzzy payoffs. We prove that players who are playing a zero-sum game with fuzzy payoffs against Nature are able to increase their joint payoff, and hence their individual payoffs, by cooperating. It is shown that a cooperative game with a fuzzy characteristic function can be constructed via the optimal game values of the zero-sum games with fuzzy payoff...


The averaging principle for perturbations of continuous time control problems with fast controlled jump parameters

We consider a class of singularly perturbed zero-sum differential games with piecewise deterministic dynamics, where the changes from one structure (for the dynamics) to another are governed by a finite-state Markov process. Player 1 controls the continuous dynamics, whereas Player 2 controls the rate of transition for the finite-state Markov process; both have access to the states of both processe...


On the equivalence of two expected average reward criteria for zero-sum semi-Markov games

In this paper we study two basic optimality criteria used in the theory of zero-sum semi-Markov games. According to the first one, the average reward for player 1 is the lim sup of the expected total rewards over a finite number of jumps divided by the expected cumulative time of these jumps. According to the second definition, the average reward (for player 1) is the lim sup of the expected to...


The Turnpike Property for Dynamic Discrete Time Zero-sum Games

We consider a class of dynamic discrete-time two-player zero-sum games. We show that for a generic cost function and each initial state, there exists a pair of overtaking equilibrium strategies over an infinite horizon. We also establish that for a generic cost function f, there exists a pair of stationary equilibrium strategies (x_f, y_f) such that each pair of “approximate” equilibrium strategie...


Zero-sum constrained stochastic games with independent state processes

We consider a zero-sum stochastic game with side constraints for both players with a special structure. There are two independent controlled Markov chains, one for each player. The transition probabilities of the chain associated with a player as well as the related side constraints depend only on the actions of the corresponding player; the side constraints also depend on the player’s controll...



Publication date: 2015